Machine learning in epidemiology: Neural networks forecasting of monkeypox cases

This study integrates advanced machine learning techniques, namely Artificial Neural Networks, Long Short-Term Memory, and Gated Recurrent Unit models, to forecast monkeypox outbreaks in Canada, Spain, the USA, and Portugal. The research focuses on the effectiveness of these models in predicting the spread and severity of cases using data from June 3 to December 31, 2022, and evaluates them against test data from January 1 to February 7, 2023. The study highlights the potential of neural networks in epidemiology, especially concerning recent monkeypox outbreaks. It provides a comparative analysis of the models, emphasizing their capabilities in public health strategies. The research identifies optimal model configurations and underscores the efficiency of the Levenberg-Marquardt algorithm in training. The findings suggest that ANN models, particularly those with optimized Root Mean Squared Error, Mean Absolute Percentage Error, and the Coefficient of Determination values, are effective in infectious disease forecasting and can significantly enhance public health responses.


Introduction
The Monkeypox Virus (MPXV), a member of the Orthopoxvirus genus, is the causative agent of the infectious disease known as monkeypox. This virus is predominantly found in Central and West African countries, with sporadic cases reported in other regions, including the United States and the United Kingdom [1-3].
Transmission of MPXV to humans often occurs through direct contact with infected animals, with their body fluids or sores, or with contaminated objects such as bedding [1,2,4]. Human-to-human transmission is also possible, mainly through close physical interaction with infected individuals or exposure to their bodily fluids [1,4]. Symptoms of MPXV infection include fever, headache, muscle aches, and a characteristic rash that spreads across the body [1,4]. In severe cases, complications such as pneumonia, sepsis, and encephalitis can occur [1,2,4].
No specific antiviral treatment for MPXV currently exists; however, supportive care can aid in symptom management and reduce the risk of complications [1,2,4]. Vaccination against smallpox has shown some effectiveness in preventing monkeypox, but routine smallpox immunization is no longer practiced [1,2]. Therefore, public health measures such as contact tracing, quarantine, and isolation are essential in controlling the spread of the disease [1,2,4].

Literature review
The study of infectious diseases, particularly emerging viruses like MPXV, has increasingly incorporated machine learning approaches to enhance prediction and management strategies. Key studies in this field have demonstrated the utility of various neural network models, such as ANN, LSTM, and GRU, in understanding and forecasting disease patterns [10-15]. Our work builds upon these foundations, particularly focusing on recent developments in monkeypox forecasting.
Early detection and prediction of infectious diseases like MPXV are crucial for effective management and response. ANN approaches, as utilized in forecasting COVID-19 cases in Pakistan, provide valuable insights for healthcare professionals and policymakers [16].
ANN techniques are increasingly being used to predict patient outcomes in various diseases, including COVID-19, breast cancer, and cardiovascular disease. For example, ANN models were employed in assessing breast cancer risk among Iranian women [17].
While prior research like [9] has leveraged neural networks for predicting MPXV spread in specific regions, our study extends this application to Canada, Spain, the USA, and Portugal. This expansion is crucial, given the distinct epidemiological profiles and healthcare systems in these countries. Such comparative analysis contributes novel insights into the geographical variance in MPXV outbreak dynamics.
The role of machine learning in epidemiological modeling has evolved rapidly, with recent advances highlighting its potential in real-time disease surveillance and response planning. Studies have explored various machine learning techniques, including deep learning and predictive analytics, to enhance the accuracy of disease outbreak predictions and to understand transmission dynamics [16,18-20].
Our study contributes to this growing body of literature by employing a combination of ANN, LSTM, and GRU models, enhanced with the ADAM optimizer [21] and the Levenberg-Marquardt learning algorithm [22]. This approach not only allows for a comprehensive analysis of MPXV spread but also offers a methodological framework that can be adapted for other infectious diseases. The integration of advanced machine learning models in our research addresses a critical gap in current epidemiological studies.
The study utilizes a range of ANN models, including LSTM and GRU, to predict MPXV cases in the USA, Canada, Spain, and Portugal, based on existing datasets. The comparative analysis of these countries will assist healthcare authorities in formulating appropriate response strategies. This research is the first in-depth study using ANN on recent MPXV outbreaks, offering new insights into the epidemic's dynamics. A time-series dataset of MPXV cases from each country, along with statistical graphs of confirmed cases, is presented [23]. The distribution and geographical representation of confirmed monkeypox cases across the studied nations are depicted in the accompanying figures.
The prediction model uses data from the "Our World in Data" website, employing neural network, LSTM, and GRU models. The model's performance is enhanced using an Adaptive Moment Estimation (ADAM) optimizer [21]. Additionally, a Levenberg-Marquardt (LM) learning algorithm is implemented for a single hidden layer ANN model, with the number of neurons optimized using K-fold cross-validation with early stopping [22]. ANN-based regression models have been effective in predicting the spread of infectious diseases like MPXV. These models enable informed decision-making by healthcare professionals and policymakers in controlling disease spread and responding effectively to outbreaks. ANN models have been applied in various domains for time-series prediction, demonstrating their versatility and efficacy [10-15].
The remainder of this paper is organized as follows: The Methodology section describes the models, training algorithms, and validation procedure used in this study. The Results and Discussions section presents the findings of the research. Following that, the Forecasting Methodology section covers the approach taken for forecasting. The paper concludes with the Conclusion section, summarizing the study's key findings.

Methodology
In the manuscript, the choice of modeling methods, including ANN, LSTM, and GRU, is justified by their proven effectiveness in time-series analysis and epidemiological forecasting. ANN is renowned for its ability to model complex nonlinear relationships, making it ideal for predicting disease spread [24]. LSTM and GRU, as advanced recurrent neural networks, effectively capture temporal dependencies in data, crucial for accurate disease trend predictions [25-27]. These methodologies are selected for their ability to handle the intricacies and variabilities in infectious disease data, making them suitable for this study's purpose. The assumptions underlying these models are standard in the field and have been extensively validated in prior research, ensuring their applicability and reliability in this context.

• Data Representativeness: The assumption that the datasets used are representative of the wider population and accurately reflect the trends in monkeypox cases.
• Stationarity of Data: The presumption that the underlying characteristics of the monkeypox data, such as trends and patterns, remain consistent over the period of study.

This study employs a comparative approach, analyzing ANN, LSTM, and GRU models due to the lack of existing research focusing on the same countries and time period. These models were selected for their proven capabilities in time-series prediction and their adaptability to different data characteristics. The comparative analysis allows for a nuanced understanding of each model's strengths and weaknesses in predicting monkeypox outbreaks.

• Data Preprocessing and Normalization: The data underwent preprocessing to correct irregularities and ensure consistency. Normalization, crucial for neural network models, involved scaling input and target values to a [0, 1] range. This step minimizes biases and enhances model interpretability.
• Model Calibration: Model calibration involved fine-tuning hyperparameters for optimal performance. This process included adjusting learning rates, batch sizes, and layer configurations to enhance model accuracy and efficiency in data prediction.
• Validation Techniques: K-fold cross-validation was employed to ensure model robustness and avoid overfitting. This technique involved dividing the dataset into 'K' subsets and iteratively training and testing the model on these subsets, providing a comprehensive assessment of model performance.
• Performance Metrics: Statistical measures such as RMSE, MAE, and R-squared were utilized to evaluate model performance. These metrics provided quantitative insights into the model's prediction accuracy, reliability, and fit to the data.

The artificial neural network
ANNs, inspired in part by the neuronal architecture of the human brain, consist of simple processing units capable of handling scalar messages. Their extensive interconnection and adaptive interaction between units make ANNs a multi-processor computer system [28,29]. ANNs offer a rapid and flexible approach to modeling, suitable for tasks such as rainfall-runoff prediction [30]. The network comprises layers of interconnected neurons, in which weighted connections link the input layer to the output layer through one or more hidden layers [31]. During training, the Back Propagation algorithm adjusts the network weights to reduce errors between the predicted and actual outputs [31]. After training with experimental data to obtain the optimal structure and weights, ANNs are evaluated on additional experimental data for validation [31]. The Multilayer Perceptron, a feed-forward ANN with one or more hidden layers, is particularly prevalent [31]. In ANNs, each node is a data structure connected within a network that is trained using standard methods like gradient descent [24,32,33]. Each node is either active (on) or inactive (off, or 0), while each edge (synapse or link between nodes) carries a weight [34-36]. Positive weights stimulate or activate the next inactive node, whereas negative weights inhibit or deactivate the subsequent active node [34,35,37].

Each neuron in an ANN receives input from the neurons of the preceding layer through weights denoted $w_{ij}$. The weighted sum of each neuron's inputs is passed through a sigmoid function $\sigma$, represented by:

$$z_j = \sigma\left(\sum_{i=1}^{n} w_{ij}\, x_i\right) \qquad (1)$$

Here, $x_i$ is the input from the i-th neuron in the preceding layer, $j$ represents the current neuron, $z_j$ is its output, and $n$ is the number of neurons in the preceding layer. Similarly, weights $w_{kj}$ connect neuron $j$ to the subsequent neuron $k$. The output $y$ of the neural network for input $x$ and true output $t$ is derived by applying the activation function to the weighted sum of the previous layer's output:

$$y_k = \sigma\left(\sum_{j=1}^{m} w_{kj}\, z_j\right) \qquad (2)$$

The quantity $m$ represents the number of neurons in the preceding layer. The objective of training the neural network is to identify the weights $w_{ij}$ and $w_{kj}$ that minimize the error between the predicted output $y_k$ and the true output $t$. This involves minimizing the cost function $E(w)$, the average squared difference between the predicted and actual output across training samples:

$$E(w) = \frac{1}{2N}\sum_{n=1}^{N}\left(y(x_n, w) - t_n\right)^2 \qquad (3)$$

Here, $x_n$ denotes the n-th input example, $t_n$ the corresponding true output, and $N$ the total number of training examples. The factor $\frac{1}{2}$ simplifies gradient calculation of the cost function during training.
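The forward pass and cost function described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the study's implementation: it assumes one hidden layer, no bias terms (none are mentioned in the text), and randomly generated toy data; the names `forward`, `cost`, and the `*_demo` arrays are hypothetical.

```python
import numpy as np


def sigmoid(a):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-a))


def forward(x, w_hidden, w_out):
    """Forward pass of a single-hidden-layer network (assumption: no biases).

    x        : inputs, shape (N, n)
    w_hidden : input-to-hidden weights, shape (n, m)
    w_out    : hidden-to-output weights, shape (m, K)
    """
    z = sigmoid(x @ w_hidden)  # sigmoid of the weighted sum of the inputs
    y = sigmoid(z @ w_out)     # sigmoid of the weighted sum of hidden outputs
    return y


def cost(y, t):
    """Half the squared error, averaged over the N training examples."""
    n_examples = y.shape[0]
    return float(np.sum((y - t) ** 2) / (2.0 * n_examples))


# Toy data (hypothetical, not from the study): 4 examples, 3 inputs, 1 output.
rng = np.random.default_rng(0)
x_demo = rng.random((4, 3))
t_demo = rng.random((4, 1))
w_hidden_demo = rng.normal(size=(3, 5))
w_out_demo = rng.normal(size=(5, 1))

y_demo = forward(x_demo, w_hidden_demo, w_out_demo)
print("outputs:", y_demo.ravel())
print("cost E(w):", cost(y_demo, t_demo))
```

Training then amounts to searching for the weight matrices that make this cost as small as possible on the training set.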
The LM optimizer, a widely used training algorithm for ANNs, was employed in this study for epidemic prediction [38,39]. The ANN was trained on a dataset using the LM technique, with the network optimized for a specific number of hidden neurons [38,39]. Performance was evaluated using the Root Mean Square Error (RMSE) and the correlation coefficient, with training aimed at minimizing the cost function value [38,39].

Levenberg-Marquardt
In numerical analysis, the LM algorithm is a renowned optimization technique for addressing nonlinear least squares problems. The LM method modifies the estimated Hessian matrix $J^T J$, where $J$ is the Jacobian matrix, by incorporating a positive combination coefficient $\mu$ and an identity matrix $I$. This adjustment ensures the invertibility of the Hessian matrix, as expressed in:

$$H \approx J^T J + \mu I \qquad (4)$$

This approximation ensures that the diagonal components of the predicted Hessian matrix are greater than zero, consequently guaranteeing the invertibility of $H$ [40,41]. The LM algorithm employs a blend of the steepest descent and Gauss-Newton algorithms. When $\mu$ is close to zero, Eq (4) aligns with the Gauss-Newton method, while a large $\mu$ leads to the application of the steepest descent approach [42].
The update rule for the LM algorithm, represented in Eq (5), involves the weight vector $V_{k+1}$ and the error vector $e_k$:

$$V_{k+1} = V_k - \left(J_k^T J_k + \mu I\right)^{-1} J_k^T e_k \qquad (5)$$

With $\mu = 0$, Eq (5) reduces to the Gauss-Newton procedure [40].
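To make the LM update concrete, the sketch below applies it to a small curve-fitting problem rather than to a neural network. Everything here is an assumption for illustration: the toy model y = a·exp(b·x), the helper names `lm_step` and `fit_lm`, and the common heuristic of dividing μ by 10 after a successful step and multiplying it by 10 after a failed one. The text does not state which μ schedule the study used.

```python
import numpy as np


def lm_step(params, residual_fn, jacobian_fn, mu):
    """One Levenberg-Marquardt update: V - (J^T J + mu I)^(-1) J^T e."""
    e = residual_fn(params)
    jac = jacobian_fn(params)
    hessian_approx = jac.T @ jac + mu * np.eye(params.size)
    return params - np.linalg.solve(hessian_approx, jac.T @ e)


def fit_lm(params, residual_fn, jacobian_fn, mu=1e-2, n_iter=50):
    """Repeat LM steps, shrinking mu after a success, growing it after a failure."""
    sse = float(np.sum(residual_fn(params) ** 2))
    for _ in range(n_iter):
        candidate = lm_step(params, residual_fn, jacobian_fn, mu)
        candidate_sse = float(np.sum(residual_fn(candidate) ** 2))
        if candidate_sse < sse:
            params, sse = candidate, candidate_sse
            mu /= 10.0  # behave more like Gauss-Newton
        else:
            mu *= 10.0  # behave more like steepest descent
    return params, sse


# Toy problem (hypothetical): recover a = 2.0 and b = -1.5 in y = a * exp(b * x).
x_demo = np.linspace(0.0, 1.0, 20)
t_demo = 2.0 * np.exp(-1.5 * x_demo)


def residuals(p):
    """Error vector e: model output minus target."""
    return p[0] * np.exp(p[1] * x_demo) - t_demo


def jacobian(p):
    """Columns: derivative of each residual with respect to a, then b."""
    return np.column_stack([np.exp(p[1] * x_demo),
                            p[0] * x_demo * np.exp(p[1] * x_demo)])


fitted, final_sse = fit_lm(np.array([1.0, 0.0]), residuals, jacobian)
print("fitted parameters:", fitted)
print("sum of squared errors:", final_sse)
```

Using `np.linalg.solve` instead of forming the inverse explicitly is a numerical convenience; the step it computes is the same as the one in the update rule.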

Adaptive moment estimation optimization
ADAM is a widely adopted optimization technique in deep learning, merging aspects of gradient descent with momentum and the Root Mean Square Propagation optimizer [21]. ADAM aims to address the shortcomings of conventional optimization methods, such as sensitivity to step size and gradient noise, by adjusting the learning rate based on estimates of the gradients' first and second moments. The update rule for ADAM is given by:

$$\theta_{t+1} = \theta_t - \frac{\alpha}{\sqrt{\hat{v}_t} + \epsilon}\,\hat{m}_t$$

where $\epsilon$ is a small constant to avoid division by zero, $\theta_t$ denotes the weights at time step $t$, $\alpha$ is the learning rate, and $\hat{m}_t$ and $\hat{v}_t$ are the bias-corrected first- and second-moment estimates of the gradients, respectively.
The first-moment estimate, $m_t$, an exponential moving average of the gradients, is calculated as:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t$$

where $m_{t-1}$ is the previous first-moment estimate, $g_t$ is the gradient at time step $t$, and $\beta_1$ is the decay rate hyperparameter for the first-moment estimate.
The second-moment estimate, $v_t$, is the exponential moving average of the squared gradients:

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2$$

where $v_{t-1}$ represents the previous second-moment estimate, and $\beta_2$ controls the decay rate of the second-moment estimate. ADAM also incorporates bias correction in the moment estimates:

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$

with $\hat{m}_t$ and $\hat{v}_t$ being the adjusted first- and second-moment estimates, respectively [21].
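The ADAM update described in this section can be sketched as follows. The function name `adam_minimize`, the quadratic toy objective, and the hyperparameter values are assumptions for illustration only; the defaults for β1, β2, and ε follow commonly quoted values, and the text does not state which settings the study used.

```python
import numpy as np


def adam_minimize(grad_fn, theta, alpha=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, n_steps=1000):
    """Minimize a function with ADAM, given a function returning its gradient."""
    m = np.zeros_like(theta)  # first-moment estimate
    v = np.zeros_like(theta)  # second-moment estimate
    for t in range(1, n_steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1.0 - beta1) * g        # moving average of gradients
        v = beta2 * v + (1.0 - beta2) * g ** 2   # moving average of squared gradients
        m_hat = m / (1.0 - beta1 ** t)           # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta


# Toy objective (hypothetical): f(theta) = sum((theta - target)^2).
target = np.array([3.0, -1.0])
theta_final = adam_minimize(lambda th: 2.0 * (th - target),
                            np.zeros(2), alpha=0.05, n_steps=2000)
print("theta after ADAM:", theta_final)
```

One property worth noting: because of the bias correction, the very first step has a magnitude close to α regardless of the gradient's scale, which is part of why ADAM is less sensitive to step size than plain gradient descent.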

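Gated recurrent unit
GRU networks are a gated variant of recurrent neural networks (RNNs) [26,43]. The update gate equation is:

$$z_t = \sigma\left(W_z\,[h_{t-1}, x_t] + b_z\right)$$

where $\sigma$ is the sigmoid activation function, $W_z$ the weight matrix for the update gate, $b_z$ the bias vector, and $[h_{t-1}, x_t]$ the concatenation of the previous hidden state and the current input.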
The reset gate equation is:

$$r_t = \sigma\left(W_r\,[h_{t-1}, x_t] + b_r\right)$$

where $\sigma$ is the sigmoid activation function, $W_r$ the reset gate's weight matrix, $b_r$ its bias vector, and $[h_{t-1}, x_t]$ the concatenation of the previous hidden state and the current input.
The candidate state equation is:

$$\tilde{h}_t = \tanh\left(W_h\,[r_t \odot h_{t-1}, x_t] + b_h\right)$$

where $\odot$ denotes element-wise multiplication, $W_h$ the weight matrix for the candidate state, $b_h$ its bias vector, and $[r_t \odot h_{t-1}, x_t]$ the concatenation of the reset gate's product with the previous hidden state and the current input. GRU networks, with their selective information updating mechanism, offer enhanced efficiency and effectiveness compared to traditional RNNs.
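A single GRU time step built from the update gate, reset gate, and candidate state can be sketched as below. The final blending of the previous state and the candidate state is not spelled out in the text; the sketch assumes one common convention, h_t = (1 - z_t) * h_{t-1} + z_t * candidate. The helper names, the random initialization, and the toy input sequence are likewise hypothetical.

```python
import numpy as np


def sigmoid(a):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-a))


def init_gru_params(n_in, n_hidden, seed=0):
    """Random weights acting on the concatenation [h_prev, x_t] (illustrative)."""
    rng = np.random.default_rng(seed)
    shape = (n_hidden, n_hidden + n_in)
    return {
        "W_z": rng.normal(scale=0.5, size=shape), "b_z": np.zeros(n_hidden),
        "W_r": rng.normal(scale=0.5, size=shape), "b_r": np.zeros(n_hidden),
        "W_h": rng.normal(scale=0.5, size=shape), "b_h": np.zeros(n_hidden),
    }


def gru_step(x_t, h_prev, p):
    """One GRU time step for a single sequence."""
    concat = np.concatenate([h_prev, x_t])
    z_t = sigmoid(p["W_z"] @ concat + p["b_z"])          # update gate
    r_t = sigmoid(p["W_r"] @ concat + p["b_r"])          # reset gate
    concat_reset = np.concatenate([r_t * h_prev, x_t])
    h_cand = np.tanh(p["W_h"] @ concat_reset + p["b_h"])  # candidate state
    # Assumed convention for combining old and candidate states.
    return (1.0 - z_t) * h_prev + z_t * h_cand


# Toy sequence (hypothetical): four scaled case counts, one feature each.
params = init_gru_params(n_in=1, n_hidden=3)
x_seq = np.array([[0.1], [0.3], [0.6], [0.4]])
h_demo = np.zeros(3)
for x_t in x_seq:
    h_demo = gru_step(x_t, h_demo, params)
print("final hidden state:", h_demo)
```

In a forecasting model, the final hidden state would typically be passed to a small output layer that produces the predicted case count.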

Long short-term memory
LSTM networks, another variant of RNNs, are adept at learning long-term dependencies by selectively retaining or forgetting information over time through gating mechanisms. An LSTM network consists of three types of gates: the forget gate, input gate, and output gate.
The forget gate determines which information from the previous cell state to retain or discard for the current time step. It generates a value between 0 and 1 for each number in the previous cell state, computed from the previous hidden state and the current input. A value of 1 implies retention, while 0 indicates discarding. The forget gate equation is given by [33]:

$$f_t = \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right)$$

where $f_t$ is the forget gate's output at time $t$, $\sigma$ the sigmoid activation function, $W_f$ the forget gate's weight matrix, $h_{t-1}$ the previous hidden state, $x_t$ the current input, $b_f$ the bias term, and $[\cdot,\cdot]$ signifies concatenation.
The input gate decides which information from the previous cell state and current input to add to the current cell state. It too generates a vector of values between 0 and 1. Values of 1 indicate addition, while 0 suggests ignoring. The input gate equation is also provided by [33]:

$$i_t = \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right)$$

where $i_t$ is the input gate's output at time $t$, $\sigma$ the sigmoid activation function, $W_i$ the weight matrix for the input gate, $h_{t-1}$ the previous hidden state, $x_t$ the current input, $b_i$ the bias term for the input gate, and $[\cdot,\cdot]$ denotes concatenation.
The output gate determines which information from the current cell state should be output as the network's final output. It produces a vector of values, ranging from 0 to 1, for each cell state value. The final network output for the current time step is formed by multiplying these values by the current cell state. The equation for the output gate is provided by [33]:

$$o_t = \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right)$$

where $o_t$ is the output gate's output at time $t$, $\sigma$ the sigmoid activation function, $W_o$ the weight matrix for the output gate, $h_{t-1}$ the previous hidden state, $x_t$ the current input, $b_o$ the bias term for the output gate, and $[\cdot,\cdot]$ indicates concatenation.
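The three gates can be combined into one LSTM time step as sketched below. The text gives only the gate equations, so the rest is assumed from the standard LSTM formulation: a tanh candidate cell state with its own weights (`W_c`, `b_c`), the cell update c_t = f_t * c_{t-1} + i_t * candidate, and the output h_t = o_t * tanh(c_t). All names and the toy sequence are hypothetical.

```python
import numpy as np


def sigmoid(a):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-a))


def init_lstm_params(n_in, n_hidden, seed=0):
    """Random weights acting on the concatenation [h_prev, x_t] (illustrative)."""
    rng = np.random.default_rng(seed)
    shape = (n_hidden, n_hidden + n_in)
    params = {}
    for gate in ("f", "i", "o", "c"):  # "c" is the assumed candidate-cell layer
        params["W_" + gate] = rng.normal(scale=0.5, size=shape)
        params["b_" + gate] = np.zeros(n_hidden)
    return params


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step for a single sequence."""
    concat = np.concatenate([h_prev, x_t])
    f_t = sigmoid(p["W_f"] @ concat + p["b_f"])   # forget gate
    i_t = sigmoid(p["W_i"] @ concat + p["b_i"])   # input gate
    o_t = sigmoid(p["W_o"] @ concat + p["b_o"])   # output gate
    c_cand = np.tanh(p["W_c"] @ concat + p["b_c"])  # assumed candidate cell state
    c_t = f_t * c_prev + i_t * c_cand             # keep part of old, add part of new
    h_t = o_t * np.tanh(c_t)                      # gated output
    return h_t, c_t


# Toy sequence (hypothetical): four scaled case counts, one feature each.
params = init_lstm_params(n_in=1, n_hidden=3)
x_seq = np.array([[0.1], [0.3], [0.6], [0.4]])
h_demo, c_demo = np.zeros(3), np.zeros(3)
for x_t in x_seq:
    h_demo, c_demo = lstm_step(x_t, h_demo, c_demo, params)
print("final hidden state:", h_demo)
print("final cell state:", c_demo)
```

Setting the forget gate near 1 and the input gate near 0 leaves the cell state essentially unchanged, which is the mechanism that lets an LSTM carry information across many time steps.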

Control parameters for each model
The performance of neural network models such as ANN, LSTM, and GRU networks depends on several tunable hyperparameters. These parameters are crucial for the learning process and are optimized during training.

ANN model hyperparameters
• Weights and Biases: Weights ($w_{ij}$ and $w_{kj}$) are the core parameters adjusted during training. They determine the strength of connections between neurons in successive layers.
• Number of Neurons in Each Layer: The size ($n$ and $m$) of each layer, especially hidden layers, influences the network's capacity to learn complex patterns.
• Learning Algorithm: Back Propagation is used for adjusting weights, typically coupled with optimization techniques like the Levenberg-Marquardt (LM) optimizer.
• Activation Function: The sigmoid function is used for neuron activation, transforming the weighted sum into an output.
• Cost Function: E(w), the mean squared error between the predicted and actual outputs, is minimized during training.
• Performance Metrics: RMSE and correlation coefficients are used for evaluating model performance.

LSTM model hyperparameters
• Forget Gate Weights ($W_f$): Controls the amount of previous cell state to retain.
• Input Gate Weights ($W_i$): Determines what new information is added to the cell state.
• Output Gate Weights ($W_o$): Decides what information to output from the cell state.
• Bias terms ($b_f$, $b_i$, $b_o$): Offset values added to gate computations.
• Activation Functions: Typically sigmoid (σ) for gates and tanh for cell state updates.

GRU model hyperparameters
• Update Gate Weights ($W_z$): Balances the previous state and new candidate state contributions.
• Reset Gate Weights ($W_r$): Determines how much past information to forget.
• Candidate State Weights ($W_h$): Computes the potential new information to be added to the state.
• Bias terms ($b_z$, $b_r$, $b_h$): Offset values for each gate and candidate state computation.
• Activation Functions: Sigmoid (σ) for update and reset gates, and tanh for candidate state.
These hyperparameters are iteratively adjusted through backpropagation and optimization algorithms to minimize loss functions, thereby improving the predictive performance of the models.

K-fold cross validation
Overfitting is a common issue with ANN models, where the model learns noise in the data rather than the actual signal, leading to poor performance on unseen datasets. To mitigate this, K-fold cross-validation is employed as a robust method [44,45]. In this technique, the data is randomly divided into K groups. In each iteration, the model is trained on (K-1) folds and then evaluated on the remaining fold, with RMSE serving as the performance metric. The learning process is monitored by plotting the number of epochs against the average RMSE on the validation folds. Training concludes when there is no significant reduction in RMSE with an increase in epochs [46]. Once model training is completed, its performance is evaluated against a separate test dataset. In this study, the features are scaled after loading the dataset, which is then divided into 10 folds for 10-fold cross-validation. The process iterates ten times, each time splitting the dataset into training and validation sets, training the model on the former, and assessing it on the latter. The model's performance is recorded in each iteration until all 10 folds have been evaluated. Finally, the average performance across all 10 folds is calculated and presented.
The method for determining the optimal number of hidden neurons in the ANN models is depicted in the flowchart in the subsection below (Flowchart of the 10-fold Cross-Validation Process). As part of this approach, a total of 12 ANN models with varying numbers of hidden layers were developed. The flowchart in Fig 3 illustrates this process in a concise manner. The flowchart in Fig 4 presents a detailed view of the neural network model training and evaluation process utilizing 10-fold cross-validation. The process begins with 'Start' and is followed by the 'Load dataset' step, where the initial dataset is loaded for analysis. Following this, a 'Preprocess' stage involves scaling the features to ensure they are normalized for optimal model performance.
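The 10-fold procedure described above can be sketched as follows. To keep the example short and runnable, a least-squares line stands in for the neural network; that substitution, the function names, and the noise-free toy data are assumptions for illustration and are not the study's setup.

```python
import numpy as np


def k_fold_indices(n_samples, k=10, seed=0):
    """Randomly split the sample indices into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)


def cross_validate(x, y, fit_fn, predict_fn, k=10, seed=0):
    """Train on k-1 folds, score RMSE on the held-out fold, repeat k times."""
    folds = k_fold_indices(len(x), k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit_fn(x[train_idx], y[train_idx])
        pred = predict_fn(model, x[val_idx])
        scores.append(float(np.sqrt(np.mean((y[val_idx] - pred) ** 2))))
    return float(np.mean(scores)), scores


def fit_line(x, y):
    """Stand-in model (assumption): ordinary least squares with an intercept."""
    design = np.column_stack([x, np.ones(len(x))])
    return np.linalg.lstsq(design, y, rcond=None)[0]


def predict_line(coef, x):
    return np.column_stack([x, np.ones(len(x))]) @ coef


# Toy data (hypothetical): 50 points on the line y = 2x + 1.
x_demo = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y_demo = 2.0 * x_demo.ravel() + 1.0

mean_rmse, fold_scores = cross_validate(x_demo, y_demo, fit_line, predict_line)
print("average RMSE over 10 folds:", mean_rmse)
```

Swapping `fit_line` and `predict_line` for functions that train and evaluate an ANN with a given number of hidden neurons would give the average validation RMSE used to compare candidate architectures.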

Neural network modelling process
This study encompassed the training and testing phases in the neural network modeling procedure. To enhance prediction accuracy and expedite model convergence, it was imperative for the data to be normalized within a specific range. The min-max normalization strategy was employed to ensure that both input and target values resided within the [0, 1] range, which is optimal for the activation function's performance [47,48].
During the training phase, the model's synaptic weights were adjusted for each candidate number of neurons in the hidden layer in order to identify the optimal one. Additionally, the training dataset was subdivided into "K" subsets using the K-fold cross-validation method. This approach facilitated the determination of the appropriate number of iterations, or "epochs," required before concluding the model's training.
Following the training, the model's accuracy and predictive capacity were evaluated using a testing dataset. Together, these phases enabled the neural network model to learn from the data and predict future instances of MPXV in the selected countries.
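The min-max normalization step mentioned in this section can be sketched in plain Python. The function names and the toy case counts are hypothetical. Taking the minimum and maximum from the training data only, then reusing them for later data, is an assumption made here as common practice; the text does not state how the study handled this.

```python
def fit_min_max(values):
    """Return the (min, max) pair that defines the scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max scaling is undefined for constant data")
    return lo, hi


def scale(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    return [(v - lo) / (hi - lo) for v in values]


def unscale(scaled, lo, hi):
    """Invert the scaling to recover values in the original units."""
    return [s * (hi - lo) + lo for s in scaled]


# Toy daily case counts (hypothetical, not from the study).
train_cases = [10, 20, 30]
lo, hi = fit_min_max(train_cases)
scaled_train = scale(train_cases, lo, hi)
print(scaled_train)
print(unscale(scaled_train, lo, hi))
```

Predictions produced in the scaled range are mapped back with `unscale` before being compared with reported case counts.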

Evaluating the performance of the neural network models
The training process repeatedly conditions the neural network models to understand the relationship between input and output. The LM learning method (refer to Eqs (4) and (5)) was employed during this phase. The model's performance was evaluated using the Root Mean Squared Error (RMSE) and the Coefficient of Determination ($R^2$). RMSE is the square root of the average squared differences between actual values and model output, whereas $R^2$ is a measure of how well the model fits the data. A model is considered to fit well when $R^2$ is close to 1.0 and RMSE approaches zero [49,50].

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2}{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}$$

Here, $n$ represents the number of values, $\hat{Y}_i$ the predicted values, $Y_i$ the actual values, and $\bar{Y}$ the mean of all values. Additional metrics such as Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) were also utilized to assess the model's performance [51].

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|Y_i - \hat{Y}_i\right|$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{Y_i - \hat{Y}_i}{Y_i}\right|$$

In these equations, MAE signifies the Mean Absolute Error, and MAPE the Mean Absolute Percentage Error, providing further insight into the model's accuracy.
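The four evaluation metrics named in this section can be computed with a few lines of plain Python. The function names and the toy values are hypothetical; MAPE is returned as a percentage and is undefined when an actual value is zero.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))


# Toy values (hypothetical, not from the study).
actual_demo = [10, 20, 30, 40]
predicted_demo = [12, 18, 33, 39]
print("RMSE:", rmse(actual_demo, predicted_demo))
print("R^2 :", r_squared(actual_demo, predicted_demo))
print("MAE :", mae(actual_demo, predicted_demo))
print("MAPE:", mape(actual_demo, predicted_demo))
```

Because MAPE divides by the actual value, days with zero reported cases would need to be excluded or handled separately before it is applied to case-count series.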
K-fold cross-validation was utilized to mitigate overfitting in our neural network models. Training was concluded when a significant reduction in RMSE was no longer observed with an increase in epochs. This method ensured effective learning without overfitting.
The Levenberg-Marquardt optimization technique was crucial in determining when to stop training. It balanced convergence speed and model accuracy, preventing excessive training iterations and ensuring optimized model performance.
For LSTM and GRU models, training stop criteria included monitoring validation loss. Training was halted if validation loss stopped decreasing or started increasing. Early stopping was implemented, where training ceased after a pre-set number of epochs without improvement in validation loss. This prevented learning noise and ensured better generalization. Other hyperparameters like learning rate and batch size were also considered. Specific thresholds for early stopping based on validation loss changes were crucial for optimizing model training.
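The early-stopping rule described above, stopping after a pre-set number of epochs without improvement in validation loss, can be sketched as a small function. The names `patience` and `min_delta`, and the loss values in the example, are assumptions for illustration; the text does not report the thresholds used in the study.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops at the first epoch where the loss has failed to improve
    on the best value by more than min_delta for `patience` epochs in a row.
    If that never happens, the last epoch is returned as the stop epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Toy validation-loss curve (hypothetical).
losses_demo = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.6]
print(early_stopping_epoch(losses_demo, patience=3))
```

In practice the weights from `best_epoch` are usually the ones kept, since later epochs did not improve the validation loss.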

Peculiarities of applied methodologies
In our exploration of epidemiological forecasting, particularly in modeling the spread of monkeypox, this study introduces a novel approach through the application of Artificial Neural Networks (ANN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) models. These methodologies have been meticulously selected based on their demonstrated efficiency in capturing complex nonlinear relationships and temporal dependencies within time-series data, essential attributes for the accurate prediction of disease trends.
The distinctiveness of the methodology lies in the comprehensive adaptation and fine-tuning of these models to cater specifically to the challenges presented by infectious disease data, which is often marked by its variability and unpredictability. By employing a comparative analysis, a strategy less frequented in the existing literature for the countries and time periods under study, the approach facilitates a deeper understanding of each model's strengths and limitations in forecasting monkeypox outbreaks.
• Customized Data Preprocessing: The data preprocessing and normalization techniques were specifically tailored to accommodate the unique characteristics of epidemiological data, ensuring that the models are fed input that accurately reflects the dynamics of disease spread. This step is crucial in epidemiological modeling, where the quality of data directly impacts the accuracy of predictions.
• Model Calibration and Validation: The methodological framework includes meticulous calibration of model hyperparameters, such as the number of neurons in hidden layers and learning rates, through an iterative process. This ensures the models are finely tuned to capture intricate patterns within the data. Furthermore, the use of K-fold cross-validation as a robust validation technique helps mitigate the risk of overfitting, a common challenge when dealing with time-series data in machine learning models.
• Advanced Optimization Techniques: The adoption of advanced optimization techniques, such as the LM algorithm for ANN and ADAM for LSTM and GRU models, underlines the uniqueness of the approach. These techniques enhance the learning process, allowing for faster convergence and improved model performance by effectively navigating the complex landscape of the cost function.
• Evaluation Metrics: The selection of comprehensive performance metrics, including RMSE, MAE, and R-squared, further supports the rigor of the methodology. These metrics provide a multifaceted view of model performance, from prediction accuracy to the fit of the model to the data, ensuring a thorough evaluation of each model's ability to accurately forecast disease trends.

Results and discussions
In this section, we delve deeper into the results and provide a more detailed discussion of the predictive performance of three neural network models: ANN, LSTM, and GRU. These models were trained using data from four countries: the USA, Canada, Spain, and Portugal. The period for training data was from June 3 to December 31, 2022, with the evaluation conducted on test data from January 1 to February 7, 2023. The outcomes of this study are illustrated in Figs 5-7.
Initially, perceptron ANN models with one and two hidden layers were developed. It was observed that one or two hidden layers sufficed for training the ANN for complex nonlinear problems [18,19]. This observation aligns with prior studies, including one forecasting dengue fever epidemics in San Juan, Puerto Rico, and the Northwest Coast of Yucatan, Mexico [19].
For network training, the LM algorithm was employed, recognized for its adaptability and efficiency. The LM method, which circumvents the computation of the exact Hessian matrix, is faster than traditional backpropagation methods. This technique has been successfully applied in other studies, including one that used a genetic algorithm to optimize the parameters of a COVID-19 SEIR model for US states [20].
The MSE trend analysis for each country revealed intrinsic differences in data characteristics and model behavior. For instance, the initial spike in MSE for Portugal suggests a phase of rapid learning, where the model aggressively adjusts its parameters to fit the complex data patterns. This phase is critical as it indicates the model's sensitivity to the initial conditions and learning rate.
Subsequent fluctuations in MSE during the training iterations are indicative of the model's continual adaptation process. These fluctuations may arise from various factors, such as the inherent noise in the data or the introduction of new patterns that the model attempts to learn. The stability observed in later iterations across all countries suggests that the models reach a point of equilibrium where learning is balanced with the complexity of the data. Moreover, the nuanced differences in MSE trends between the ANN, LSTM, and GRU models point to the distinct ways these architectures process temporal data.
To determine the optimal number of hidden neurons, the standard approach outlined in the above subsection (K-fold Cross Validation) was followed. A total of 12 ANN models with varying numbers of hidden layers were constructed, as detailed in Tables 1-20. The best model for each scenario was selected based on its evaluation using $R^2$, MAPE, and RMSE. Lower values for RMSE and MAPE and higher values for $R^2$ were indicative of better model performance.
Tables 1-24 present the performance metrics of neural network models trained on data from these countries. Each table contains 11 columns, beginning with Sl No, the serial number or index of the row in the table. RMSE, MAPE, and $R^2$ are key metrics for evaluating regression model performance. The tables for each country's dataset cover ANN, LSTM, and GRU models with single and two hidden layers, showcasing the impact of neurons and layers on predictive accuracy and generalization. This comparative analysis aids in selecting the optimal neural network configuration for each dataset.
Each of Tables 1-24 is dedicated to a specific type of neural network model (ANN, LSTM, GRU) and considers variations in the number of hidden layers and neurons. The performance of each model configuration is evaluated based on several metrics: RMSE, $R^2$, and MAPE. These metrics are calculated for training, validation, and test datasets. For each country's dataset, there are tables corresponding to ANN models with single and two hidden layers, LSTM models with single and two hidden layers, and GRU models with single and two hidden layers. The tables are designed to help in selecting the optimal model configuration for each type of neural network, based on the performance metrics across different datasets. This detailed comparison aids in understanding how the number of neurons and hidden layers in a model can impact its predictive accuracy and generalization capabilities for specific datasets. The data in the tables has been adjusted to display certain values as percentages where relevant. This adjustment is especially useful for metrics like $R^2$.

Analysis
The evaluation of neural network models for different datasets, as summarized in Table 25, reveals insightful trends and performance benchmarks.
For the Canada dataset, the ANN model with a single hidden layer and 8 neurons and the ANN model with two hidden layers and 3 neurons show commendable performance, particularly in achieving high R² percentages and low RMSE values. The LSTM and GRU models, both single and double-layered, also exhibit competitive performance, with the GRU single- [...] RMSE metrics. This indicates their effectiveness in capturing the underlying patterns in the dataset with a balance of complexity and generalization ability. These results underscore the importance of choosing the right architecture and neuron count in neural network models for different datasets, highlighting the effectiveness of certain configurations in optimizing predictive performance.

Forecasting methodology
In our study, we conducted a detailed forecasting analysis for Canada, Portugal, and the USA using different neural network architectures. The goal was to predict the number of MPXV cases one month ahead and to compare the prediction with the actual reported cases. The accuracy of these forecasts was quantified using the MAPE. For Canada, with 43 actual cases, our models demonstrated varying levels of accuracy. The ANN with a single hidden layer predicted 42 cases with a MAPE of 2.3%, showcasing its high precision (Table 26). The ANN with two hidden layers achieved the same MAPE, also predicting 42 cases (Table 26).
In Portugal, with 53 actual cases, our ANN models achieved notable accuracy. The single-layer ANN model estimated 54 cases with a MAPE of 1.9%, while the two-layer ANN model achieved perfect accuracy with a MAPE of 0.0%, predicting 53 cases. For a scenario with 51 actual cases, the two-layer ANN model showed a slight increase in MAPE to 2.0%, estimating 52 cases. The forecasting results for the USA, with 47 actual cases, further highlighted the effectiveness of the ANN models. The single-layer ANN model estimated 50 cases with a MAPE of 6.4%, whereas the two-layer model predicted 48 cases with a reduced MAPE of 2.1%. Across all countries, the ANN models consistently outperformed the LSTM and GRU models in terms of accuracy, as reflected in their lower MAPE values. This suggests that ANN architectures, particularly with two hidden layers, are more adept at capturing the trends and nuances in the data, leading to more accurate forecasts of MPXV cases.
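The reported percentages can be checked by hand. For a single forecast, the MAPE reduces to the absolute percentage error |actual - forecast| / actual × 100. The sketch below applies this to the actual and forecast counts quoted in the text. The function name `percentage_error` and the tuple layout are illustrative assumptions.

```python
def percentage_error(actual, forecast):
    """Absolute percentage error of a single forecast, in percent."""
    return 100.0 * abs(actual - forecast) / actual


# (country, model, actual cases, forecast cases) as quoted in the text.
reported = [
    ("Canada", "ANN, one hidden layer", 43, 42),
    ("Portugal", "ANN, one hidden layer", 53, 54),
    ("Portugal", "ANN, two hidden layers", 53, 53),
    ("USA", "ANN, one hidden layer", 47, 50),
    ("USA", "ANN, two hidden layers", 47, 48),
]

for country, model, actual, forecast in reported:
    error = percentage_error(actual, forecast)
    print(f"{country:8s} {model:24s} {error:.1f}%")
```

Rounded to one decimal place, these reproduce the values stated above: 2.3%, 1.9%, 0.0%, 6.4%, and 2.1%.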

Discussion: benefits of the results in the wider perspective of industrial production
The findings of this study have significant implications for practical application in public health management, particularly in the context of infectious disease outbreaks like monkeypox. The predictive models developed can be integrated into health surveillance systems, aiding healthcare authorities in early detection and response planning. This proactive approach is crucial for effective disease management, enabling timely interventions such as targeted vaccinations and public health advisories.
Moreover, the methodology and results can be adapted for forecasting other infectious diseases, demonstrating the versatility of the approach. This adaptability is particularly beneficial for regions where healthcare resources are limited, as it allows for strategic allocation of resources based on predicted outbreak patterns. Such data-driven strategies can optimize the use of medical supplies, personnel, and facilities, enhancing the overall efficiency of healthcare systems.
In addition, the study's approach can be instrumental in guiding policy decisions, such as travel advisories or quarantine measures, by providing accurate forecasts of disease spread. This is especially relevant in the context of global health, where the mobility of populations can significantly impact the dynamics of infectious diseases.
Furthermore, the potential for collaboration with industries involved in healthcare technology cannot be overlooked. The integration of advanced neural network models into health tech solutions can pave the way for more sophisticated disease tracking and prediction tools, contributing to the larger goal of global health security.

Conclusion
This study presented a comprehensive analysis of three neural network models (ANN, LSTM, and GRU) for predicting the spread of MPXV in the USA, Canada, Spain, and Portugal. Our findings demonstrated that while each model has its strengths, certain models outperformed others in specific scenarios.
For instance, the ANN model exhibited superior performance in terms of lower RMSE and higher R² values compared to the other models, particularly in predicting short-term trends. The LSTM and GRU models also predicted with good accuracy and, although outperformed by the ANN model, still provided valuable insights.
Quantitatively, over a 1-month forecasting horizon the ANN model achieved a lower average RMSE and a higher R² than both the LSTM and the GRU models. These results highlight the potential of utilizing advanced machine learning techniques in epidemiological forecasting.
The study's methodology, while robust, has certain limitations. The accuracy of the neural network models, including LSTM and GRU, hinges on the quality and completeness of the epidemiological data, which may have gaps or inaccuracies. The complexity of these models can also lead to overfitting, limiting generalizability to new data or scenarios. Moreover, the models' predictions are based on past data and may not account for future changes in virus behavior, public health policies, or other unforeseen factors.
To address the limitation of machine learning models' inability to extrapolate beyond the conditions of the study, one solution is to incorporate a diverse and comprehensive dataset that covers a wide range of scenarios. This can help the model learn various patterns and improve its generalizability. Additionally, employing techniques like transfer learning, where a model trained on one task is fine-tuned for another related task, can help in adapting the model to new conditions. Regular updating and retraining of the model with new data as they become available can also ensure the model remains relevant and accurate over time. Furthermore, combining machine learning models with domain-specific knowledge and expert insights can enhance the model's applicability to new conditions.
The methods utilized in this study, specifically ANN, LSTM, and GRU, are not only theoretically robust but also practically applicable in scientific research. Their adaptability in analyzing complex data patterns makes them invaluable tools in epidemiological studies, such as forecasting infectious disease spread. These models can handle large-scale data efficiently, identifying underlying trends and making accurate predictions. This capability is crucial for public health officials and researchers in planning interventions and making informed decisions based on predictive analytics.

Figs 1 and 2 (Fig 1a shows the distribution of confirmed cases, while Fig 1b provides a geographical representation on a global map). Additionally, the sequence of confirmed MPXV instances, detailed with peak intervals from June to October 2022 for Canada, Portugal, Spain, and the USA, is illustrated in Fig 2a-2d.

Fig 2. (a) Sequence of confirmed MPXV instances in Canada, with a detailed view of the peak interval (June to October 2022), (b) Sequence of confirmed MPXV instances in Portugal, with a detailed view of the peak interval (June to October 2022), (c) Sequence of confirmed MPXV instances in Spain, with a detailed view of the peak interval (June to October 2022), (d) Sequence of confirmed MPXV instances in the USA, with a detailed view of the peak interval (June to October 2022). https://doi.org/10.1371/journal.pone.0300216.g002

Fig 3. Flowchart of the 10-fold cross-validation process. https://doi.org/10.1371/journal.pone.0300216.g003

Fig 5 shows the training performance of the ANN model for MPXV over iterations, as measured by MSE. Each line represents one of the four countries, with the MSE values plotted against the number of iterations. The training process of the neural network models is characterized by several distinct phases, as evidenced by the MSE trends for each country. Initially, there is a noticeable spike in MSE for Portugal, indicative of the model's rapid learning and calibration to correct early inaccuracies. As the iterations progress, the MSE for all countries converges towards lower values, suggesting an improvement in the model's predictive accuracy on the training dataset. Despite this overall trend, the MSE fluctuates, potentially reflecting the model's adjustments to diverse patterns within the data. Notably, the MSE lines for Portugal, Spain, Canada, and the United States exhibit comparative stability, with Portugal's model showing consistently lower MSE values, hinting at better performance on the Portugal data relative to the other countries.

Fig 6 shows the LSTM model's training performance for MPXV, with MSE used as the evaluation metric. As in Fig 5, the MSE values converge. The LSTM model for Portugal shows a unique trend, with a slight increase in MSE at the later iterations.

The GRU model's training progression for MPXV is captured in Fig 7, with MSE again serving as the performance metric. All countries show a rapid initial decrease in MSE, followed by a plateau. Notably, the GRU model for Portugal shows the most consistent MSE values across iterations. Despite these fluctuations, a convergence towards a stable MSE range is observed for all countries, indicative of effective learning.

In Figs 5-7, the training performance of the ANN, LSTM, and GRU models for MPXV over iterations is showcased, as measured by MSE. The detailed dynamics of this training process, including the specific learning curves for the ANN model across the four studied countries, are further elaborated in Figs 8-10, highlighting the reduction in loss over epochs.

Fig 5. Iteration-dependent evolution of the ANN model's training performance for MPXV, evaluated using the mean squared error (MSE) metric. https://doi.org/10.1371/journal.pone.0300216.g005

Neurons: The count of neurons in the neural network's hidden layer.
RMSE (Train): The model's RMSE on the training dataset, multiplied by 1000 for scale.
R² (Train): Coefficient of determination for the model on the training dataset, expressed as a percentage.
MAPE (Train): Model's MAPE on the training dataset, expressed as a percentage.
RMSE (Validation): RMSE of the model on the validation set, scaled by 1000.

Fig 7. Iteration-dependent evolution of the GRU model's training performance for MPXV, evaluated using the mean squared error (MSE) metric. https://doi.org/10.1371/journal.pone.0300216.g007

[...] R² and MAPE, along with other ratio-based figures. Furthermore, to avoid an abundance of decimal places and to improve clarity, the RMSE values have been scaled up by a factor of 1000.

Fig 10 presents the learning curves for GRU models across four different countries: Canada, Portugal, Spain, and the United States. Each model's training process, represented by the blue line, shows a reduction in loss over epochs, indicating effective learning. Notably, the Canadian and United States models demonstrate a pronounced decrease in training loss, whereas the validation loss for Portugal remains notably stable, suggesting consistent model performance. The Spanish model's validation loss exhibits more variability, potentially highlighting challenges in generalization. No apparent signs of overfitting are observed within the range of epochs presented, as the validation losses do not trend upwards. Overall, the models [...]

Fig 8. Learning curves for ANN models across four different countries: Canada, Portugal, Spain, and the United States. The training process is represented by the blue line and the validation process by the red line, with the reduction in loss over epochs indicating effective learning. https://doi.org/10.1371/journal.pone.0300216.g008

Fig 9. Learning curves for LSTM models in Canada, Portugal, Spain, and the United States. Each subplot shows the training loss (blue line) decreasing over epochs, indicative of the model's learning capacity, while the validation loss (red line) presents fluctuations, reflecting the model's generalization to new data. Notable is the slight convergence between the two losses, suggesting a balance between learning and model complexity. https://doi.org/10.1371/journal.pone.0300216.g009

Table 1. Identification of the most suitable ANN configuration with a single hidden layer for the Canada dataset.
https://doi.org/10.1371/journal.pone.0300216.t001

Table 4. Determination of the best LSTM configuration with two hidden layers for the Canada dataset.
Columns: Neurons | RMSE (Train) × 1000 | R² (Train) % | MAPE (Train) % | RMSE (Validation) × 1000 | R² (Validation) % | MAPE (Validation) % | RMSE (Test) × 1000 | R² (Test) % | MAPE (Test) %
In the context of the Portugal dataset, the ANN single-layer model with 7 neurons stands out, especially in training performance. For the double-layer models, all three types of neural

Table 7. Identification of the most suitable ANN configuration with a single hidden layer for the Portugal dataset.
neuron each exhibit impressive R² percentages, particularly in the validation and test phases, indicating strong predictive accuracy. The Spain dataset shows a similar pattern, where the ANN single-layer model with 8 neurons excels in both training and testing phases. In the two-hidden-layer scenario, the ANN model with 11 neurons and the LSTM model with 5 neurons are noteworthy for their high R² values and low RMSE scores, suggesting robust model performance. For the USA dataset, the single-layer ANN model with 5 neurons and the double-layer ANN model with 12 neurons show superior performance, particularly in terms of R² and